Imputation methods for a binary variable

Author

Seppo LAAKSONEN

Abstract

Binary variables are common in surveys; examples include employed vs unemployed, healthy vs unhealthy, and poor vs non-poor. The last of these is used in the examples of this paper. Unfortunately, survey data are never complete: missingness occurs, sometimes severely. Missingness due to unit nonresponse may bias the estimates. Sophisticated nonresponse adjustments are available for this purpose, and they should be applied as well as possible, taking advantage of auxiliary variables for the individual nonrespondents as well as population-level aggregates. When nonrespondents participate partially in the survey (item nonresponse), more micro-level auxiliary variables are available. Such data should naturally be exploited as fully as possible too, which leads to imputation methods. In this paper, our binary variable gives too low a poverty rate when it is estimated from the complete respondents only, so it is beneficial to improve the estimate with imputation. This is not necessarily easy, since our pattern of auxiliary data is not excellent, as is common in real life. Consequently, the results depend on the imputation method applied more than they would in a favourable situation, and we therefore compare several methods. The imputation process consists of two steps: (i) the imputation model and (ii) the imputation task. The dependent variable of the imputation model is always binary, but it is of two types: (a) the binary variable being imputed itself, or (b) the binary response indicator. The former uses data from the respondents only, whereas the latter uses data from both the respondents and the nonrespondents. Since we know the true values (although the missingness mechanism is not known), we can compare the different methods. When using methods based on random numbers, we can also apply multiple imputation methodology, so both single imputation and multiple imputation methods are compared in the empirical part.
Our strategy for multiple imputation is not only Bayesian, which is the usual approach in software packages such as SPSS and SAS. We test their methods but concentrate on our own solutions, which we call non-Bayesian.
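The two-step process described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes a type (a) imputation model reduced to its simplest form (the respondent poverty rate within cells of one categorical auxiliary variable), an imputation task that draws each missing value at random with that probability, and a toy data set invented for illustration. All function names and the `cell` variable are hypothetical. The model parameters are held fixed across repetitions rather than drawn from a posterior distribution; whether this corresponds to the paper's "non-Bayesian" solutions is an assumption.

```python
import random
from collections import defaultdict

def fit_cell_model(records):
    """Imputation model: poverty rate among respondents in each auxiliary cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [number poor, number of respondents]
    for cell, poor in records:
        if poor is not None:              # only respondents carry information here
            counts[cell][0] += poor
            counts[cell][1] += 1
    return {cell: k / n for cell, (k, n) in counts.items()}

def impute_once(records, model, rng):
    """Imputation task: replace each missing value by a Bernoulli(p_cell) draw.

    Assumes every cell contains at least one respondent.
    """
    return [
        (cell, poor if poor is not None else int(rng.random() < model[cell]))
        for cell, poor in records
    ]

def poverty_rate(records):
    """Share of units with poor == 1 in a completed data set."""
    return sum(poor for _, poor in records) / len(records)

def multiple_imputation(records, m=20, seed=1):
    """Repeat the random imputation m times and average the m estimates."""
    rng = random.Random(seed)
    model = fit_cell_model(records)       # parameters fixed, not drawn from a posterior
    estimates = [poverty_rate(impute_once(records, model, rng)) for _ in range(m)]
    return sum(estimates) / m, estimates

# Hypothetical toy data: (auxiliary cell, poor indicator or None if missing).
records = (
    [("low", 1)] * 3 + [("low", 0)] * 3 + [("low", None)] * 4
    + [("high", 1)] * 1 + [("high", 0)] * 7 + [("high", None)] * 2
)

model = fit_cell_model(records)
respondents = [(c, p) for c, p in records if p is not None]
mi_estimate, estimates = multiple_imputation(records)
print("complete-case rate:", poverty_rate(respondents))
print("multiple-imputation rate:", mi_estimate)
```

In the toy data, missingness is concentrated in the cell with the higher poverty rate, so the complete-case estimate sits below what the imputed data sets tend to give, which mirrors the situation the abstract describes. A single imputation corresponds to `m=1`.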


Similar resources

Influence of Pattern of Missing Data on Performance of Imputation Methods: An Example from National Data on Drug Injection in Prisons

Background: Policy makers need models to be able to detect groups at high risk of HIV infection. Incomplete records and dirty data are frequently seen in national data sets. Presence of missing data challenges the practice of model development. Several studies suggested that performance of imputation methods is acceptable when missing rate is moderate. One of the issues which was of less concern...


An Empirical Comparison of Performance of the Unified Approach to Linearization of Variance Estimation after Imputation with Some Other Methods

Imputation is one of the most common methods to reduce item non-response effects. Imputation results in a complete data set, and then it is possible to use naïve estimators. After using most of the common imputation methods, the mean and total (imputation estimators) are still unbiased. However, their variances (imputation variances) are underestimated by naïve variance estimators. Sampling mechanism an...


The Classical Linear Regression Model with one Incomplete Binary Variable

We present three different methods based on the conditional mean imputation when binary explanatory variables are incomplete. Apart from the single imputation and multiple imputation, especially the so-called pi imputation is presented as a new procedure. Seven procedures are compared in a simulation experiment when missing data are confined to one independent binary variable: complete case analysi...


Recovery of information from multiple imputation: a simulation study

Background: Multiple imputation is becoming increasingly popular for handling missing data. However, it is often implemented without adequate consideration of whether it offers any advantage over complete case analysis for the research question of interest, or whether potential gains may be offset by bias from a poorly fitting imputation model, particularly as the amount of missing...


112-30: Rounding after Multiple Imputation with Non-Binary Categorical Covariates

At some point in their careers many SAS users will confront a problem of missing data. A variety of statistical approaches have been developed to handle this problem. One of the most promising of these to emerge in the last several decades is multiple imputation. In SAS multiple imputation can now be performed in many contexts with the use of the MI and MIANALYZE procedures. To use these proced...



Publication date: 2015
